Effective Use of the Level-Two Cache for Skewed Tiling (Extended Version)
Authors
Abstract
Tiling is a well-known loop transformation technique to enhance temporal data locality. In our previous work, we developed a skewed tiling technique for relaxation codes, which requires applying loop skewing before loop tiling. In this paper, we study how to effectively use the level-two cache for skewed tiling through a tile-size selection algorithm, STS. In particular, we address two questions: (1) when to focus on enhancing locality for the L2 cache instead of the L1 cache, and (2) how to improve the L2 cache locality such that the overall performance can be improved. We address the first question by developing an execution cost model which incorporates both the L1 and the L2 cache misses. We address the second question by applying inter-array padding to minimize cross-interference misses. We compare STS with several previously known algorithms. For certain test cases, STS is significantly better than those previous algorithms because it effectively exploits the L2 cache locality. For other cases, STS achieves comparable results because it also effectively exploits the L1 cache locality. For two well-known SPEC benchmarks with different inputs on two different machines, we also compare our inter-array padding algorithm with a previously proposed padding algorithm. Our padding algorithm is significantly better.
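As an illustration of what "loop skewing before loop tiling" means for a relaxation code, the following is a minimal Python sketch. It is not the STS algorithm and not the authors' code. The in-place three-point relaxation kernel, the function names, and the choice to tile only the skewed loop are all assumptions made for the example. The skew j = i + t turns the dependence distances (0,1), (1,0), (1,-1) into (0,1), (1,1), (1,0), which are all non-negative, so rectangular tiles in the (t, j) space can be executed one after another.

```python
def relax_reference(a, steps):
    """Untiled in-place three-point relaxation (hypothetical example kernel)."""
    a = list(a)
    n = len(a)
    for t in range(steps):
        for i in range(1, n - 1):
            a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return a


def relax_skewed_tiled(a, steps, tile):
    """Same kernel after skewing (j = i + t) and tiling the skewed j loop.

    `tile` is the tile width along j; it is a free parameter here and is
    not chosen by any cost model.
    """
    a = list(a)
    n = len(a)
    # Interior points are i in [1, n-2], so j = i + t spans [1, steps + n - 3].
    j_end = steps + n - 2  # exclusive upper bound on j
    for jj in range(1, j_end, tile):          # loop over tiles of the skewed loop
        for t in range(steps):                # every time step inside the tile
            lo = max(jj, t + 1)               # clip the tile to the valid j range
            hi = min(jj + tile, t + n - 1)
            for j in range(lo, hi):
                i = j - t                     # undo the skew to index the array
                a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return a
```

Within one tile the same window of roughly `tile + steps` array elements is reused across all time steps, which is the temporal locality that tiling aims for. Because every update sees the same operand values as in the untiled loop, the two functions return identical results for any positive tile width.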
Similar resources
Impact of Tile-Size Selection for Skewed Tiling
Tile-size selection is known to be a complex problem. This paper develops a new selection algorithm. Unlike previous algorithms, this new algorithm considers the effect of loop skewing on cache misses. It also estimates loop overhead and incorporates it into the execution cost model, which turns out to be critical to the decision between tiling a single loop level vs. tiling two loop levels. O...
Improve Replica Placement in Content Distribution Networks with Hybrid Technique
The increasing use of the Internet and its accelerated growth reduce available network bandwidth and server capacity; as a result, the quality of Internet services becomes unacceptable for users, even though efficient and effective delivery of web content plays an important role in improving performance. Content distribution networks were introduced to address this issue. Replicatin...
Dynamic tiling for effective use of shared caches on multithreaded processors
Simultaneous multithreaded (SMT) processors use data caches which are dynamically shared between threads. Depending on the processor workload, sharing the data cache may harm performance due to excessive cache conflicts. A way to overcome this problem is to physically partition the cache between threads. Unfortunately, partitioning the cache requires additional hardware and may lead to lower ut...
Model-Driven Automatic Tiling with Cache Associativity Lattices
Traditional compiler optimization theory distinguishes three separate classes of cache miss – Cold, Conflict and Capacity. Tiling for cache is typically guided by capacity miss counts. Models of cache function have not been effectively used to guide cache tiling optimizations due to model error and expense. Instead, heuristic or empirical approaches are used to select tilings. We argue that con...
A Stable and Efficient Loop Tiling Algorithm
Loop tiling is an effective optimizing transformation to boost the memory performance of a program, especially for dense matrix scientific computations. The magnitude and stability of the achieved performance improvements are heavily dependent on the appropriate selection of tile sizes. Many existing tile selection algorithms try to find tile sizes which eliminate self-interference cache conflic...
Publication date: 2013